Giving AI Agents Memory: Failure Memory (SCARs) vs. Solution Recall

A stateless agent is a goldfish. It fails, you fix the prompt, and next session it makes the identical mistake because it remembers nothing. "AI agent memory" is one of the hottest topics of 2026, and almost all of it is about the same thing: storing past successes so the agent sounds smarter and stays consistent.

That's useful, but I think it's the smaller half. The bigger win in production is the opposite: storing past failures so the agent stops repeating them. Below are two complementary engines, both extracted from 18 months of running agents in production.

Half 1: Failure Memory (give the agent scars)

Agents get stuck in loops — same error, same response, same failure, repeat — burning tokens and money each turn. Agent-Scars breaks the cycle with a three-beat system:

1. Record: log the incident when the agent fails, e.g. scar.recordIncident('syntax_error', 'coder', 'openai', 400, message)
2. Detect: spot a pattern, meaning the same error class repeated N times in a session.
3. Guard: inject a warning block into the next prompt before the agent acts.
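The three beats can be sketched as a small in-memory class. This is a minimal sketch under stated assumptions, not the Agent-Scars implementation: the class name ScarSession, the parameter names (errorClass, agent, provider, status, message), and the default threshold of 2 (picked to match the "Failed 2 times" example) are all hypothetical. Only the recordIncident argument order comes from the call above, and persistence is left out.

```typescript
// Hypothetical sketch of record -> detect -> guard. Names, field meanings,
// and the default threshold are assumptions, not Agent-Scars' real API.
interface Incident {
  errorClass: string; // e.g. 'syntax_error'
  agent: string;      // e.g. 'coder'
  provider: string;   // e.g. 'openai'
  status: number;     // e.g. an HTTP status such as 400
  message: string;
  at: number;         // timestamp in ms
}

class ScarSession {
  private incidents: Incident[] = [];

  // `threshold` is the assumed N: how many repeats count as a pattern.
  constructor(private readonly threshold = 2) {}

  // Beat 1: record every failure as a structured incident.
  recordIncident(errorClass: string, agent: string, provider: string,
                 status: number, message: string): void {
    this.incidents.push({ errorClass, agent, provider, status, message, at: Date.now() });
  }

  // Beat 2: a pattern is any error class seen at least `threshold` times.
  detectPatterns(): Map<string, number> {
    const counts = new Map<string, number>();
    for (const inc of this.incidents) {
      counts.set(inc.errorClass, (counts.get(inc.errorClass) ?? 0) + 1);
    }
    const patterns = new Map<string, number>();
    counts.forEach((n, errorClass) => {
      if (n >= this.threshold) patterns.set(errorClass, n);
    });
    return patterns;
  }

  // Beat 3: only inject a guard once at least one pattern exists.
  shouldGuard(): boolean {
    return this.detectPatterns().size > 0;
  }
}

// Usage: one failure is noise; the second identical failure is a pattern.
const session = new ScarSession(2);
session.recordIncident('syntax_error', 'coder', 'openai', 400, 'Unexpected end of JSON input');
console.log(session.shouldGuard()); // false
session.recordIncident('syntax_error', 'coder', 'openai', 400, 'Unexpected token }');
console.log(session.detectPatterns().get('syntax_error')); // 2
```

Counting per error class rather than per exact message is what lets two differently worded syntax failures collapse into one pattern.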

The guard block is the whole trick. Once the same error has failed twice in a session, this gets prepended to the prompt before the third attempt:

The repeat-failure guard (injected into the next prompt):

### !!! REPEAT FAILURE GUARD — DO NOT REPEAT THESE ERRORS !!!
The following errors have occurred MULTIPLE times in this session.
Prioritize fixing them:

▶ Pattern: SYNTAX COMPLIANCE ERROR (Failed 2 times)
  Fix Instruction: Output must be valid syntax. Verify braces,
  brackets, parentheses, and commas. Never truncate mid-block.

### !!! END REPEAT FAILURE GUARD !!!

The model reads its own scar and changes behavior. Incidents persist to SQLite (with a JSON fallback) keyed by workspace and project, so the memory survives across sessions and stays isolated per tenant.
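Because the guard is plain text, rendering it takes little more than a string template. A minimal sketch, assuming a hypothetical FIX_INSTRUCTIONS table that maps error classes to a label and fix text, plus hypothetical helpers renderGuard and withGuard; it approximates the guard format shown earlier (with an ASCII hyphen in the header) rather than reproducing Agent-Scars' code.

```typescript
// Hypothetical lookup table: error class -> human-readable label and fix.
const FIX_INSTRUCTIONS: Record<string, { label: string; fix: string }> = {
  syntax_error: {
    label: 'SYNTAX COMPLIANCE ERROR',
    fix: 'Output must be valid syntax. Verify braces, brackets, parentheses, and commas. Never truncate mid-block.',
  },
};

// Render the guard from detected patterns (error class -> failure count).
function renderGuard(patterns: Map<string, number>): string {
  if (patterns.size === 0) return ''; // nothing repeated, inject nothing
  const lines: string[] = [
    '### !!! REPEAT FAILURE GUARD - DO NOT REPEAT THESE ERRORS !!!',
    'The following errors have occurred MULTIPLE times in this session.',
    'Prioritize fixing them:',
    '',
  ];
  patterns.forEach((count, errorClass) => {
    // Unknown classes fall back to a generic instruction.
    const entry = FIX_INSTRUCTIONS[errorClass] ?? {
      label: errorClass.toUpperCase(),
      fix: 'Do not repeat this error; change your approach.',
    };
    lines.push(`▶ Pattern: ${entry.label} (Failed ${count} times)`);
    lines.push(`  Fix Instruction: ${entry.fix}`);
    lines.push('');
  });
  lines.push('### !!! END REPEAT FAILURE GUARD !!!');
  return lines.join('\n');
}

// Prepend the guard (if any) so the model reads it before the task itself.
function withGuard(prompt: string, patterns: Map<string, number>): string {
  const guard = renderGuard(patterns);
  return guard ? `${guard}\n\n${prompt}` : prompt;
}

const guarded = withGuard('Write the config parser.', new Map([['syntax_error', 2]]));
console.log(guarded.startsWith('### !!! REPEAT FAILURE GUARD')); // true
```

Returning an empty string when nothing has repeated keeps the guard out of the prompt entirely on healthy runs, so it costs no tokens until it is needed.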

Why failures beat successes

A remembered success makes one task slightly better. A remembered failure prevents a whole category of expensive, repeated mistakes, including the infinite-retry loops that quietly run up your bill. Negative memory has higher leverage.

Half 2: Solution Recall (don't rebuild what you built last week)

The positive counterpart: when a new task arrives, has the agent solved something similar before? Agent-Recall stores each completed solution — goal, approach, outcome, confidence — and finds the closest prior match, injecting it into the prompt before the agent starts:

The recall block (injected before the agent starts):

### RECALL: Similar Task Found in Memory
Similarity: 68%
Prior Goal: Write a function to parse CSV files in Node.js
How it was solved: Used fs.readFileSync, split by newlines, then
by comma. Handled quoted fields with regex.
Outcome: Working CSV parser, 40 lines, handles edge cases
Confidence: 90%
Consider this approach before starting from scratch.
### END RECALL

Matching uses keyword-overlap similarity — deliberately simple, zero dependencies, no embedding service required to start. You can upgrade to vector similarity later, but the cheap version already kills most "reinvent the wheel" waste.
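Keyword overlap fits in a few lines. The version below is an assumption, not Agent-Recall's code: it scores overlap as Jaccard similarity (shared keywords divided by all distinct keywords), drops a small hypothetical stopword list, and only returns a match at or above a hypothetical 0.3 threshold. The Solution fields mirror the goal/approach/outcome/confidence record described above; findClosest and renderRecall are illustrative names.

```typescript
// Hypothetical record shape mirroring goal / approach / outcome / confidence.
interface Solution {
  goal: string;
  approach: string;
  outcome: string;
  confidence: number; // 0..1
}

// Assumed stopword list; real systems would tune this.
const STOPWORDS = new Set(['a', 'an', 'the', 'to', 'in', 'of', 'for', 'and', 'with', 'on']);

// Lowercase, split on non-alphanumerics, drop stopwords and 1-char tokens.
function keywords(text: string): Set<string> {
  const words: string[] = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return new Set(words.filter((w) => w.length > 1 && !STOPWORDS.has(w)));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((w) => { if (b.has(w)) shared++; });
  return shared / (a.size + b.size - shared);
}

// Return the best-scoring prior solution, or null if none clears minScore.
function findClosest(goal: string, memory: Solution[], minScore = 0.3):
    { match: Solution; score: number } | null {
  const query = keywords(goal);
  let best: { match: Solution; score: number } | null = null;
  for (const s of memory) {
    const score = jaccard(query, keywords(s.goal));
    if (score >= minScore && (best === null || score > best.score)) best = { match: s, score };
  }
  return best;
}

// Render the recall block in the format shown above.
function renderRecall(hit: { match: Solution; score: number }): string {
  const pct = (x: number) => `${Math.round(x * 100)}%`;
  return [
    '### RECALL: Similar Task Found in Memory',
    `Similarity: ${pct(hit.score)}`,
    `Prior Goal: ${hit.match.goal}`,
    `How it was solved: ${hit.match.approach}`,
    `Outcome: ${hit.match.outcome}`,
    `Confidence: ${pct(hit.match.confidence)}`,
    'Consider this approach before starting from scratch.',
    '### END RECALL',
  ].join('\n');
}

const memory: Solution[] = [{
  goal: 'Write a function to parse CSV files in Node.js',
  approach: 'Used fs.readFileSync, split by newlines, then by comma. Handled quoted fields with regex.',
  outcome: 'Working CSV parser, 40 lines, handles edge cases',
  confidence: 0.9,
}];
const hit = findClosest('Parse a CSV file with quoted fields in Node.js', memory);
if (hit) console.log(renderRecall(hit)); // Similarity: 40% (4 shared of 10 distinct keywords)
```

This toy scorer reports 40% for the sample task, not the 68% in the block above, because the article doesn't specify its exact weighting; treating "file" and "files" as the same keyword, or weighting rare terms higher, would move the number.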

The Two Halves Together

Scars: negative memory ("you broke this before, here's the fix")

Recall: positive memory ("you solved this before, reuse the approach")

Inject: both work by prepending a compact block to the next prompt

Notice that neither needs a fancy memory framework. At its useful core, memory is just three steps: persist structured records of what happened, retrieve the relevant ones, and inject them as context before the model acts. That's the same discipline as context engineering, and it pairs naturally with a grounded retrieval pipeline.
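The persist, retrieve, inject loop can be sketched without any framework. This is a minimal illustration under assumptions: MemoryStore, MemoryRecord, and inject are hypothetical names; the scope string stands in for the workspace/project keying mentioned earlier; and topic-substring matching is a stand-in for real relevance scoring.

```typescript
// Hypothetical generic memory record covering both halves.
type MemoryKind = 'scar' | 'recall';

interface MemoryRecord {
  kind: MemoryKind;
  scope: string; // e.g. `${workspace}/${project}`, keeps tenants isolated
  topic: string; // what the record is about, used for retrieval
  block: string; // the compact text to inject
}

class MemoryStore {
  private records: MemoryRecord[] = [];

  // Persist: append a structured record of what happened.
  persist(record: MemoryRecord): void {
    this.records.push(record);
  }

  // Retrieve: same-scope records whose topic appears in the task,
  // scars first so warnings precede suggestions, capped at `limit`.
  retrieve(scope: string, task: string, limit = 2): MemoryRecord[] {
    const t = task.toLowerCase();
    return this.records
      .filter((r) => r.scope === scope && t.includes(r.topic.toLowerCase()))
      .sort((a, b) => (a.kind === b.kind ? 0 : a.kind === 'scar' ? -1 : 1))
      .slice(0, limit);
  }
}

// Inject: prepend the retrieved blocks so the model reads them before acting.
function inject(records: MemoryRecord[], prompt: string): string {
  return [...records.map((r) => r.block), prompt].join('\n\n');
}

const store = new MemoryStore();
store.persist({ kind: 'recall', scope: 'acme/api', topic: 'csv', block: '### RECALL ... ### END RECALL' });
store.persist({ kind: 'scar', scope: 'acme/api', topic: 'csv', block: '### !!! REPEAT FAILURE GUARD ... !!!' });
store.persist({ kind: 'scar', scope: 'other/app', topic: 'csv', block: 'never visible to acme/api' });
const task = 'Parse a CSV upload';
console.log(inject(store.retrieve('acme/api', task), task)); // guard, then recall, then the task
```

Filtering on scope before anything else is what keeps one tenant's scars out of another tenant's prompts.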

What I Built

Agent-Scars (failure memory) and Agent-Recall (solution memory) both run offline in mock mode, no API key needed. They're two of the boring, high-leverage pieces I argue for in my post on the boring infrastructure that actually ships.